feat: Enable evaluators to run after resume in eval runtime #1127

Chibionos · 2026-01-15T18:20:19Z

Summary

Enables evaluators to execute and produce scores after the agent completes following a resume operation.

Changes

1. Consistent Thread ID for Checkpointing

Changed the runtime ID (used as the LangGraph thread_id) from a value unique per run to eval_item.id (consistent across suspend/resume)
This allows LangGraph checkpoints to be found when resuming from a suspended state
Without this, resume would start with a new thread_id and fail to find the checkpoint
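
The effect of a stable thread ID can be sketched with an in-memory stand-in for the checkpoint store. This is a minimal illustration, not the runtime's code: CheckpointStore, thread_id_for, and the "eval-1" ID are hypothetical names, and using uuid4 for the old per-run ID is an assumption. Only the idea of keying checkpoints by eval_item.id comes from this PR.

```python
import uuid


class CheckpointStore:
    """Hypothetical in-memory stand-in for a LangGraph checkpointer."""

    def __init__(self):
        self._by_thread = {}

    def save(self, thread_id, state):
        self._by_thread[thread_id] = state

    def load(self, thread_id):
        # Returns None when no checkpoint exists for this thread_id.
        return self._by_thread.get(thread_id)


def thread_id_for(eval_item_id, stable=True):
    # stable=True mirrors this PR: reuse the eval item's ID.
    # stable=False mimics a unique-per-run ID (uuid4 is an assumption).
    return eval_item_id if stable else str(uuid.uuid4())


store = CheckpointStore()

# Per-run IDs: the suspend run saves under one ID, the resume run looks up another.
store.save(thread_id_for("eval-1", stable=False), {"step": "interrupt"})
lost = store.load(thread_id_for("eval-1", stable=False))

# Stable IDs: both runs derive the same thread_id from the eval item.
store.save(thread_id_for("eval-1"), {"step": "interrupt"})
found = store.load(thread_id_for("eval-1"))
```

With per-run IDs the lookup misses the checkpoint; with the eval item's ID the resume run finds the state saved at suspend time.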

2. Resume Mode Implementation

When the --resume flag is set, pass Command(resume=data) to continue from the interrupt() point
Uses mock resume data for testing: {"status": "completed", "result": "mock_completion_data"}
In production, the orchestrator provides the actual result data from external work (RPA process, human input, etc.)
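
A rough sketch of how the graph input might be chosen in resume mode. ResumeCommand is a stand-in for langgraph.types.Command so the snippet runs without LangGraph installed, and build_graph_input is a hypothetical helper, not a function from this PR. The mock payload is the one quoted above.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Mock payload quoted from this PR description.
MOCK_RESUME_DATA = {"status": "completed", "result": "mock_completion_data"}


@dataclass
class ResumeCommand:
    """Stand-in for langgraph.types.Command (assumption: only `resume` is needed here)."""
    resume: Any


def build_graph_input(resume: bool, eval_inputs: dict, resume_data: Optional[dict] = None):
    """Pick what to feed the graph: fresh inputs, or a resume command."""
    if not resume:
        # Normal run: start from the eval item's inputs.
        return eval_inputs
    # Resume run: fall back to the mock payload when no real result is supplied.
    payload = resume_data if resume_data is not None else MOCK_RESUME_DATA
    return ResumeCommand(resume=payload)
```

In real code the return value for the resume branch would be a LangGraph Command, whose resume value becomes the return value of the pending interrupt() call.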

3. Evaluator Execution

Previously: Evaluators were skipped during suspend (correct) but also could not run after resume
Now: Agent completes after resume → evaluators run on final output → scores generated
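
The gating described above could look roughly like this. Status, score_run, and the non_empty evaluator are hypothetical names for illustration; the real runtime uses UiPathRuntimeStatus and its own evaluator interfaces.

```python
from enum import Enum


class Status(Enum):
    # Hypothetical stand-in for UiPathRuntimeStatus.
    SUCCESSFUL = "successful"
    SUSPENDED = "suspended"


def score_run(status, output, evaluators):
    """Run evaluators only when the agent actually finished."""
    if status is Status.SUSPENDED:
        # No final output yet; scoring is deferred to the resumed run.
        return None
    return {name: fn(output) for name, fn in evaluators.items()}


# A toy evaluator: scores 1.0 for any non-empty output.
evaluators = {"non_empty": lambda out: 1.0 if out else 0.0}

suspended_scores = score_run(Status.SUSPENDED, None, evaluators)
resumed_scores = score_run(Status.SUCCESSFUL, "done", evaluators)
```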

Testing

Before (evaluators don't run on resume):

rm -rf __uipath/state.db
uipath eval agent-simple evaluations/eval-sets/test.json  # Suspends
uipath eval agent-simple --resume  # Agent re-suspends, evaluators still skipped

After (evaluators run and produce scores):

rm -rf __uipath/state.db
uipath eval agent-simple evaluations/eval-sets/test.json  # Suspends
uipath eval agent-simple --resume  # Agent completes, evaluators run ✓

Related PRs

feat: pack uv.lock #414 (uipath-langchain-python) - Sample demonstrating suspend/resume
TBD (uipath-agents-python) - Integration testing

Architecture

SUSPEND PHASE (Resume mode: False)
  eval_item.id → runtime_id → thread_id
  Agent suspends at interrupt()
  Checkpoint saved with thread_id=eval_item.id
  Evaluators skipped ✓

RESUME PHASE (Resume mode: True) 
  eval_item.id → runtime_id → thread_id (same as suspend!)
  Checkpoint found using thread_id
  Command(resume=data) passed to interrupt()
  Agent completes execution
  Evaluators run on final output ✓

Notes

Mock resume data is used for testing; production orchestrator provides actual data
Backward compatible: non-suspend scenarios unaffected
Thread ID consistency maintained per eval_item across suspend/resume cycles

Adds support for suspending and resuming evaluations that invoke RPA processes. When an evaluation suspends while waiting for an external job, it can now be resumed after the job completes.

Changes:
- Added SUSPENDED status detection after agent execution
- Added --resume flag to 'uipath eval' command
- Skip evaluator execution for suspended runs (evaluators run on resume)
- Pass triggers through evaluation flow to enable resume
- Added comprehensive logging for suspend/resume debugging

Testing done with tool-calling-suspend-resume sample in uipath-langchain-python PR #414.

This is a critical fix for serverless executor integration.

Problem:
- Inner runtime (agent) returns SUSPENDED status when interrupt() is called
- Evaluation runtime was hardcoding SUCCESSFUL status in the result
- Serverless executor sees SUCCESSFUL and doesn't suspend the job
- State is not saved, resume cannot work

Solution:
- Check all evaluation run results for SUSPENDED status
- Propagate SUSPENDED to top-level UiPathRuntimeResult
- Also handle FAULTED status propagation (FAULTED > SUCCESSFUL, SUSPENDED > FAULTED)

This ensures the serverless executor:
- Detects SUSPENDED status correctly
- Saves checkpoint to blob storage
- Saves trigger to SQL database
- Suspends the job properly
- Can resume when trigger completes

Addresses feedback from @cristian-pufu in PR review.

The check 'if overall_status != UiPathRuntimeStatus.SUSPENDED' was redundant because we break immediately when SUSPENDED is found, so overall_status can never be SUSPENDED at the FAULTED check point.

Simplified logic:
- SUSPENDED: set and break (highest priority)
- FAULTED: set and continue (in case later eval is SUSPENDED)
- SUCCESSFUL: default

This makes the priority explicit: SUSPENDED > FAULTED > SUCCESSFUL
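
The simplified priority logic can be sketched as follows. RunStatus is a hypothetical stand-in for UiPathRuntimeStatus and aggregate_status is an illustrative name, so treat this as a reading of the commit message rather than the actual implementation.

```python
from enum import Enum


class RunStatus(Enum):
    # Hypothetical stand-in for UiPathRuntimeStatus.
    SUCCESSFUL = "successful"
    FAULTED = "faulted"
    SUSPENDED = "suspended"


def aggregate_status(run_statuses):
    """Combine per-eval statuses: SUSPENDED > FAULTED > SUCCESSFUL."""
    overall = RunStatus.SUCCESSFUL  # default
    for status in run_statuses:
        if status is RunStatus.SUSPENDED:
            # Highest priority: nothing later can override it, so stop.
            overall = RunStatus.SUSPENDED
            break
        if status is RunStatus.FAULTED:
            # Keep scanning in case a later eval is SUSPENDED.
            overall = RunStatus.FAULTED
    return overall
```

Because the loop breaks as soon as SUSPENDED is seen, the FAULTED branch never runs with overall already set to SUSPENDED, which is why the extra guard was unnecessary.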

Changes in this release:
- Fix: Propagate SUSPENDED status from inner runtime to evaluation result
- Fix: Remove redundant condition in status propagation logic
- Feat: Add --resume flag for eval command
- Feat: Add comprehensive logging for suspend/resume flow
- Docs: Add interrupt/suspend/resume architecture documentation

- Use eval_item.id as runtime_id (thread_id) for consistent checkpointing across suspend and resume invocations
- When --resume flag is set, pass Command(resume=data) to continue from interrupt() point instead of starting fresh
- Mock resume data for testing; production orchestrator provides actual result data from external work (RPA, HITL, etc.)
- This allows evaluators to execute and produce scores after agent completes post-resume

Fixes the issue where evaluators were not running in resume mode.

Chibi Vikram added 5 commits January 15, 2026 07:38

github-actions bot added the labels test:uipath-langchain (triggers tests in the uipath-langchain-python repository) and test:uipath-llamaindex (triggers tests in the uipath-llamaindex-python repository) on Jan 15, 2026
